-
-
Notifications
You must be signed in to change notification settings - Fork 11k
[BUGFIX] Raise an error for no draft token case when draft_tp>1 #6369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUGFIX] Raise an error for no draft token case when draft_tp>1 #6369
Conversation
|
@cadedaniel @zifeitong @comaniac
|
b9af049 to
ad8390c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Can we enable the test and expect to catch this exception?
96ec39e to
02dc475
Compare
|
@comaniac Thanks for the review :) But I'm not sure, the CI seems to have stopped. |
|
@comaniac The CI test passed :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| if not self.allow_no_draft_tokens and sum( | ||
| proposals.proposal_lens) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worry about the overhead that sum brings, but I feel it's fine for now given that it won't be triggered with draft model TP=1. wdyt @cadedaniel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, is it possible to store this field in proposals when we create proposals? that way we don't need an additional CPU-GPU-CPU sync
| self.scorer_worker = scorer_worker | ||
| self.disable_by_batch_size = disable_by_batch_size or float("inf") | ||
| self.spec_decode_sampler = spec_decode_sampler | ||
| self.allow_no_draft_tokens = allow_zero_draft_token_step |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we mark this private, e.g. _allow_no_draft_tokens? we should have done this for all properties but we missed it
| if not self.allow_no_draft_tokens and sum( | ||
| proposals.proposal_lens) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, is it possible to store this field in proposals when we create proposals? that way we don't need an additional CPU-GPU-CPU sync
|
Thanks for the review! I'm gonna handle it right now :) |
2861e99 to
3876858
Compare
|
@comaniac Is there any problem in CI now? |
3876858 to
4b338d2
Compare
|
The failure you posted seems random. I'll monitor the current CI run and manually retry failed ones. |
|
@cadedaniel @comaniac I've updated the code as your suggestion and I think the code passed the test. @comaniac CI has finished. Would you retry the 'documentation-build' test? Thank you! |
|
@simon-mo can we get a force merge, doc build seems broken |
…-project#6369) Signed-off-by: Alvant <[email protected]>
…-project#6369) Signed-off-by: LeiWang1999 <[email protected]>
This PR adds a simple patch to raise an error to prevent users from hitting the hang error stated in #5814.
This error happens only when the skip speculation feature is activated and there are no generated draft tokens for "all" sequences in a step, and draft_tp > 1.
We may revisit this issue later because it's not resolved completely.